fast unlock in contention by BusyJay · Pull Request #461 · Amanieu/parking_lot

BusyJay · 2025-05-07T08:15:45Z

During contention, almost all threads are active on the CPU, so unlocking quickly lets those threads make progress sooner. This improves global throughput under high contention a lot.

One shortcoming is that fair unlock now has to be invoked explicitly.

This is an improvement to #418.

BusyJay · 2025-05-07T08:20:14Z

Running cargo run --bin mutex --release -- 9:36:9 5 5 2 2:

Running with 9 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	477.862 kHz	478.851 kHz	11.738 kHz
parking_lot::Mutex (master)	364.841 kHz	365.127 kHz	11.269 kHz
std::sync::Mutex	769.754 kHz	767.714 kHz	23.908 kHz
pthread_mutex_t	982.966 kHz	989.991 kHz	31.816 kHz

Running with 18 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	219.059 kHz	218.991 kHz	4.160 kHz
parking_lot::Mutex (master)	82.786 kHz	82.975 kHz	2.435 kHz
std::sync::Mutex	389.199 kHz	394.549 kHz	21.358 kHz
pthread_mutex_t	482.005 kHz	489.225 kHz	26.404 kHz

Running with 27 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	164.219 kHz	164.298 kHz	1.971 kHz
parking_lot::Mutex (master)	28.246 kHz	28.306 kHz	0.443 kHz
std::sync::Mutex	280.553 kHz	280.014 kHz	10.115 kHz
pthread_mutex_t	311.815 kHz	311.582 kHz	8.409 kHz

Running with 36 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	112.111 kHz	112.127 kHz	0.856 kHz
parking_lot::Mutex (master)	22.055 kHz	22.059 kHz	0.150 kHz
std::sync::Mutex	193.672 kHz	195.078 kHz	10.369 kHz
pthread_mutex_t	224.436 kHz	225.767 kHz	12.334 kHz

BusyJay · 2025-05-07T08:28:36Z

Running cargo run --bin rwlock --release -- 36 9 5 5 2 2

parking_lot::RwLock (this pr) - [write] 1102.323 kHz [read] 2943.833 kHz
parking_lot::RwLock (master) - [write] 628.062 kHz [read] 954.938 kHz
seqlock::SeqLock - [write] 648.979 kHz [read] 152225.000 kHz
pthread_rwlock_t - [write] 1678.253 kHz [read] 376.558 kHz

BusyJay · 2025-05-07T10:11:39Z

Reimplemented the PR by maintaining the parked bit on the waker side. The new implementation is less error-prone and works with Condvar directly.

Benchmarks show even more positive results:

Running cargo run --bin mutex --release -- 9:36:9 5 5 2 2:

Running with 9 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	405.134 kHz	406.469 kHz	9.105 kHz
parking_lot::Mutex (master)	364.841 kHz	365.127 kHz	11.269 kHz
std::sync::Mutex	769.754 kHz	767.714 kHz	23.908 kHz
pthread_mutex_t	982.966 kHz	989.991 kHz	31.816 kHz

Running with 18 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	268.530 kHz	268.355 kHz	5.586 kHz
parking_lot::Mutex (master)	82.786 kHz	82.975 kHz	2.435 kHz
std::sync::Mutex	389.199 kHz	394.549 kHz	21.358 kHz
pthread_mutex_t	482.005 kHz	489.225 kHz	26.404 kHz

Running with 27 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	185.802 kHz	186.233 kHz	2.598 kHz
parking_lot::Mutex (master)	28.246 kHz	28.306 kHz	0.443 kHz
std::sync::Mutex	280.553 kHz	280.014 kHz	10.115 kHz
pthread_mutex_t	311.815 kHz	311.582 kHz	8.409 kHz

Running with 36 threads

name	average	median	std.dev.
parking_lot::Mutex (this pr)	134.010 kHz	133.784 kHz	1.509 kHz
parking_lot::Mutex (master)	22.055 kHz	22.059 kHz	0.150 kHz
std::sync::Mutex	193.672 kHz	195.078 kHz	10.369 kHz
pthread_mutex_t	224.436 kHz	225.767 kHz	12.334 kHz

Running cargo run --bin rwlock --release -- 36 9 5 5 2 2

parking_lot::RwLock (this pr) - [write] 6121.347 kHz [read] 968.373 kHz
parking_lot::RwLock (master) - [write] 628.062 kHz [read] 954.938 kHz
seqlock::SeqLock - [write] 648.979 kHz [read] 152225.000 kHz
pthread_rwlock_t - [write] 1678.253 kHz [read] 376.558 kHz

Amanieu · 2025-05-07T10:23:23Z

-        {
+        let mut prev = self.state.load(Ordering::Relaxed);
+        let new_state = prev & !LOCKED_BIT;
+        prev = self.state.swap(new_state, Ordering::Release);


There's a bug here: you may "forget" a parked thread if another thread sets PARKED_BIT between the load and swap.

BusyJay · 2025-05-07

In that case prev must be set to PARKED_BIT | LOCKED_BIT at L104, so it can't pass the check at L105.
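The interleaving discussed in this review thread can be replayed deterministically on a single thread. The following is only a sketch: the bit values and the function name `replay_load_then_swap` are assumptions made for illustration, and the `fetch_or` call stands in for whatever the parking thread actually does in the PR. It shows what the state word holds after the swap and what the swap hands back in `prev`.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Assumed bit layout, chosen for illustration only.
const LOCKED_BIT: u8 = 0b01;
const PARKED_BIT: u8 = 0b10;

/// Replays the load / set-PARKED_BIT / swap interleaving on one thread.
/// Returns (value returned by the swap, state left behind by the swap).
pub fn replay_load_then_swap() -> (u8, u8) {
    // The lock is held and nobody is parked yet.
    let state = AtomicU8::new(LOCKED_BIT);

    // Unlocking thread, step 1: read the state and compute the new value.
    let stale = state.load(Ordering::Relaxed);
    let new_state = stale & !LOCKED_BIT;

    // Stand-in for another thread setting PARKED_BIT between load and swap.
    state.fetch_or(PARKED_BIT, Ordering::Relaxed);

    // Unlocking thread, step 2: publish the value computed from the stale read.
    let prev = state.swap(new_state, Ordering::Release);

    (prev, state.load(Ordering::Relaxed))
}
```

In this replay the stored state no longer carries PARKED_BIT, which is the concern raised in the review, while the value returned by the swap still does, which is the point of the reply. Whether the unlock path then wakes the parked thread depends on the check that follows the swap, which is not reproduced here.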

BusyJay · 2025-05-07T13:56:31Z

Benchmarking with the command from #418 (cargo run --release 32 2 10000 100), the regression seems resolved:

std::sync::Mutex avg 30.795793ms min 28.369313ms max 33.668656ms
parking_lot::Mutex (this PR) avg 40.800542ms min 37.16543ms max 44.621677ms
parking_lot::Mutex (master) avg 206.836045ms min 183.902676ms max 213.697023ms
spin::Mutex avg 63.898884ms min 58.45244ms max 74.323676ms
AmdSpinlock avg 70.131547ms min 65.356139ms max 83.456119ms

std::sync::Mutex avg 30.52266ms min 28.69828ms max 34.945486ms
parking_lot::Mutex (this PR) avg 41.146074ms min 38.453175ms max 42.433051ms
parking_lot::Mutex (master) avg 210.387478ms min 187.38791ms max 215.752182ms
spin::Mutex avg 62.823716ms min 54.801191ms max 74.31628ms
AmdSpinlock avg 68.937325ms min 55.406785ms max 80.83359ms

This is an alternative implementation of the idea in Amanieu#461. Compared to Amanieu#461, this PR maintains the parked bit on the waiter side, so that the waker doesn't have to perform an atomic operation twice. The waker now also resets all lock state back to 0 no matter what state it was in. This makes the fast lock path more likely to succeed during high contention. Signed-off-by: Jay <BusyJay@users.noreply.github.com>

This is an alternative, more aggressive implementation of the idea in Amanieu#461. Compared to Amanieu#461, this PR:

- maintains the parked bit on the waiter side, so that the waker doesn't have to perform an atomic operation twice.
- resets all lock state back to 0 on unlock. This makes the fast lock path more likely to succeed during high contention.
- sets PARKED_BIT even when a waiter is prevented from sleeping, so that more threads can be woken up during contention to compete for progress.

Signed-off-by: Jay <BusyJay@users.noreply.github.com>
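The "reset all lock state back to 0 on unlock" idea from the commit message above might look roughly like the sketch below. `ToyLockWord`, its methods, and the bit values are hypothetical names and assumptions, not parking_lot's implementation; parking, waking, and the waiter-side handling of PARKED_BIT are left out entirely.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Assumed bit layout, chosen for illustration only.
const LOCKED_BIT: u8 = 0b01;
const PARKED_BIT: u8 = 0b10;

/// Hypothetical toy lock word; not parking_lot's RawMutex.
pub struct ToyLockWord {
    state: AtomicU8,
}

impl ToyLockWord {
    /// Builds a lock word that starts in the given state.
    pub fn with_state(initial: u8) -> Self {
        ToyLockWord { state: AtomicU8::new(initial) }
    }

    /// Fast lock path: succeeds only when the whole word is 0.
    pub fn try_lock_fast(&self) -> bool {
        self.state
            .compare_exchange(0, LOCKED_BIT, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
    }

    /// Unlock with a single atomic operation: clear every bit, then inspect
    /// the previous value. Returns true when a parked thread needs waking.
    pub fn unlock(&self) -> bool {
        let prev = self.state.swap(0, Ordering::Release);
        prev & PARKED_BIT != 0
    }
}
```

Because the word is always 0 after `unlock`, the next `try_lock_fast` can succeed even if waiters were recorded before the unlock. In this sketch the caller would be responsible for waking a waiter whenever `unlock` returns true.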

BusyJay added 2 commits May 7, 2025 09:55

Revert "fast unlock in contention"

80d0746

This reverts commit d43aee1. Signed-off-by: Jay <BusyJay@users.noreply.github.com>

reimplement by maintaining parked bit in waker

3cb7712

Signed-off-by: Jay <BusyJay@users.noreply.github.com>

Amanieu reviewed May 7, 2025


BusyJay mentioned this pull request May 8, 2025

even faster unlock in contention #462

Open

BusyJay wants to merge 3 commits into Amanieu:master from BusyJay:fix-contention-regression
